{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MiniRocket\n",
    "\n",
    "MiniRocket transforms input time series using a small, fixed set of convolutional kernels. For each resulting feature map, it computes a single feature by PPV pooling, i.e., the proportion of positive values in that feature map. The transformed features are then used to train a linear classifier.\n",
    "\n",
    "Dempster A, Schmidt DF, Webb GI (2020) MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification [arXiv:2012.08791](https://arxiv.org/abs/2012.08791)"
   ]
  },
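  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of PPV pooling, the sketch below convolves a toy series with a single length-9 kernel and takes the proportion of positive values in the resulting feature map. The series, the kernel weights, and the bias are illustrative assumptions, not the values MiniRocket actually selects, and the real transform also applies dilation and padding, which are omitted here.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Toy input series (illustrative values only).\n",
    "x = np.sin(np.linspace(0.0, 4.0 * np.pi, 50))\n",
    "\n",
    "# One length-9 kernel; the weights and bias are assumptions for this sketch.\n",
    "kernel = np.array([-1.0, -1.0, 2.0, -1.0, 2.0, -1.0, 2.0, -1.0, -1.0])\n",
    "bias = 0.0\n",
    "\n",
    "# Feature map: sliding dot product of the kernel with the series, minus the bias.\n",
    "feature_map = np.correlate(x, kernel, mode='valid') - bias\n",
    "\n",
    "# PPV: the proportion of positive values in the feature map.\n",
    "ppv = np.mean(feature_map > 0)\n",
    "```\n",
    "\n",
    "Each combination of kernel, dilation, and bias yields one such value per series, so the transformed data has one column per combination."
   ]
  },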
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1 Univariate Time Series\n",
    "\n",
    "### 1.1 Imports\n",
    "\n",
    "Import the example datasets, MiniRocket and MiniRocketMultivariate, `RidgeClassifierCV` and `make_pipeline` (scikit-learn), and NumPy.\n",
    "\n",
    "**Note**: MiniRocket and MiniRocketMultivariate are compiled by Numba on import. The compiled functions are cached, so compilation should only happen once, i.e., the first time you import either class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:03.214929Z",
     "iopub.status.busy": "2020-10-12T17:43:03.214184Z",
     "iopub.status.idle": "2020-10-12T17:43:03.216304Z",
     "shell.execute_reply": "2020-10-12T17:43:03.216990Z"
    }
   },
   "outputs": [],
   "source": [
    "# !pip install --upgrade numba"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.linear_model import RidgeClassifierCV\n",
    "from sklearn.pipeline import make_pipeline\n",
    "\n",
    "from sktime.datasets import load_arrow_head  # univariate dataset\n",
    "from sktime.datasets.base import load_basic_motions  # multivariate dataset\n",
    "from sktime.transformations.panel.rocket import MiniRocket, MiniRocketMultivariate"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 Load the Training Data\n",
    "\n",
    "For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).\n",
    "\n",
    "**Note**: Input time series must be of length *at least* 9. Shorter time series can be padded using, e.g., `PaddingTransformer` (`sktime.transformations.panel.padder`)."
   ]
  },
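  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the length requirement in the note above concrete, here is a minimal sketch of what padding a short series amounts to. It uses plain NumPy rather than `PaddingTransformer`, and padding on the right with zeros is an assumption made for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "min_length = 9  # minimum series length stated in the note above\n",
    "short_series = np.array([0.3, 1.2, -0.7, 0.5])  # toy series of length 4\n",
    "\n",
    "# Append zeros on the right until the series reaches the minimum length.\n",
    "n_missing = max(0, min_length - len(short_series))\n",
    "padded = np.pad(short_series, (0, n_missing), mode='constant', constant_values=0.0)\n",
    "```\n",
    "\n",
    "A series that already has 9 or more values is returned unchanged, because `n_missing` is then 0."
   ]
  },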
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:08.743652Z",
     "iopub.status.busy": "2020-10-12T17:43:08.741410Z",
     "iopub.status.idle": "2020-10-12T17:43:08.749009Z",
     "shell.execute_reply": "2020-10-12T17:43:08.749629Z"
    }
   },
   "outputs": [],
   "source": [
    "X_train, y_train = load_arrow_head(split=\"train\", return_X_y=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 Initialise MiniRocket and Transform the Training Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:08.753121Z",
     "iopub.status.busy": "2020-10-12T17:43:08.752621Z",
     "iopub.status.idle": "2020-10-12T17:43:08.941014Z",
     "shell.execute_reply": "2020-10-12T17:43:08.941496Z"
    }
   },
   "outputs": [],
   "source": [
    "minirocket = MiniRocket()  # by default, MiniRocket produces ~10,000 features\n",
    "minirocket.fit(X_train)\n",
    "X_train_transform = minirocket.transform(X_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 Fit a Classifier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We suggest using `RidgeClassifierCV` (scikit-learn) for smaller datasets (fewer than ~10,000 training examples), and logistic regression trained with stochastic gradient descent for larger datasets.\n",
    "\n",
    "**Note**: For larger datasets, this means integrating MiniRocket with stochastic gradient descent such that the transform is performed per minibatch, *not* simply substituting, e.g., `LogisticRegression` for `RidgeClassifierCV`."
   ]
  },
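  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The per-minibatch idea from the note above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, `transform` is a hypothetical stand-in for a fitted transform such as `minirocket.transform`, and the logistic regression is a hand-written binary version in NumPy rather than a library implementation.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "# Synthetic stand-in data: 200 series of length 32, two classes shifted by +/- 1.\n",
    "y = rng.integers(0, 2, size=200).astype(float)\n",
    "X = rng.normal(size=(200, 32)) + (2.0 * y - 1.0)[:, None]\n",
    "\n",
    "\n",
    "def transform(batch):\n",
    "    # Hypothetical stand-in for a fitted transform (e.g. minirocket.transform);\n",
    "    # it returns two simple per-series features so the sketch runs without sktime.\n",
    "    return np.column_stack([batch.mean(axis=1), batch.std(axis=1)])\n",
    "\n",
    "\n",
    "weights = np.zeros(2)\n",
    "intercept = 0.0\n",
    "learning_rate = 0.5\n",
    "batch_size = 20\n",
    "\n",
    "for epoch in range(20):\n",
    "    order = rng.permutation(len(X))\n",
    "    for start in range(0, len(X), batch_size):\n",
    "        idx = order[start:start + batch_size]\n",
    "        # Transform only the current minibatch, not the whole training set.\n",
    "        features = transform(X[idx])\n",
    "        # One gradient step of logistic regression on this minibatch.\n",
    "        probs = 1.0 / (1.0 + np.exp(-(features @ weights + intercept)))\n",
    "        error = probs - y[idx]\n",
    "        weights -= learning_rate * features.T @ error / len(idx)\n",
    "        intercept -= learning_rate * error.mean()\n",
    "\n",
    "# For this small toy set, evaluation transforms all series at once.\n",
    "predictions = (transform(X) @ weights + intercept) > 0\n",
    "accuracy = np.mean(predictions == (y == 1.0))\n",
    "```\n",
    "\n",
    "Because the transform is applied inside the loop, the full transformed training set never has to be held in memory at once."
   ]
  },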
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:08.993410Z",
     "iopub.status.busy": "2020-10-12T17:43:08.947187Z",
     "iopub.status.idle": "2020-10-12T17:43:09.066548Z",
     "shell.execute_reply": "2020-10-12T17:43:09.067299Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n",
       "       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n",
       "       2.15443469e+02, 1.00000000e+03]),\n",
       "                  normalize=True)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10), normalize=True)\n",
    "classifier.fit(X_train_transform, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.5 Load and Transform the Test Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:09.071414Z",
     "iopub.status.busy": "2020-10-12T17:43:09.070666Z",
     "iopub.status.idle": "2020-10-12T17:43:09.931075Z",
     "shell.execute_reply": "2020-10-12T17:43:09.931598Z"
    }
   },
   "outputs": [],
   "source": [
    "X_test, y_test = load_arrow_head(split=\"test\", return_X_y=True)\n",
    "X_test_transform = minirocket.transform(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.6 Classify the Test Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:09.935232Z",
     "iopub.status.busy": "2020-10-12T17:43:09.934675Z",
     "iopub.status.idle": "2020-10-12T17:43:10.031071Z",
     "shell.execute_reply": "2020-10-12T17:43:10.031624Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8514285714285714"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "classifier.score(X_test_transform, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## 2 Multivariate Time Series\n",
    "\n",
    "We can use the multivariate version of MiniRocket for multivariate time series input.\n",
    "\n",
    "### 2.1 Imports\n",
    "\n",
    "Import MiniRocketMultivariate.\n",
    "\n",
    "**Note**: MiniRocketMultivariate compiles via Numba on import."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# MiniRocketMultivariate was already imported in section 1.1 above"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Load the Training Data\n",
    "\n",
    "**Note**: Input time series must be of length *at least* 9. Shorter time series can be padded using, e.g., `PaddingTransformer` (`sktime.transformations.panel.padder`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:10.054652Z",
     "iopub.status.busy": "2020-10-12T17:43:10.034190Z",
     "iopub.status.idle": "2020-10-12T17:43:10.394311Z",
     "shell.execute_reply": "2020-10-12T17:43:10.394905Z"
    }
   },
   "outputs": [],
   "source": [
    "X_train, y_train = load_basic_motions(split=\"train\", return_X_y=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 Initialise MiniRocketMultivariate and Transform the Training Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:10.410718Z",
     "iopub.status.busy": "2020-10-12T17:43:10.410103Z",
     "iopub.status.idle": "2020-10-12T17:43:11.186318Z",
     "shell.execute_reply": "2020-10-12T17:43:11.186801Z"
    }
   },
   "outputs": [],
   "source": [
    "minirocket_multi = MiniRocketMultivariate()\n",
    "minirocket_multi.fit(X_train)\n",
    "X_train_transform = minirocket_multi.transform(X_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 Fit a Classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:11.190556Z",
     "iopub.status.busy": "2020-10-12T17:43:11.190017Z",
     "iopub.status.idle": "2020-10-12T17:43:11.396461Z",
     "shell.execute_reply": "2020-10-12T17:43:11.397135Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n",
       "       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n",
       "       2.15443469e+02, 1.00000000e+03]),\n",
       "                  normalize=True)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10), normalize=True)\n",
    "classifier.fit(X_train_transform, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.5 Load and Transform the Test Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:11.401025Z",
     "iopub.status.busy": "2020-10-12T17:43:11.400273Z",
     "iopub.status.idle": "2020-10-12T17:43:12.450777Z",
     "shell.execute_reply": "2020-10-12T17:43:12.451162Z"
    }
   },
   "outputs": [],
   "source": [
    "X_test, y_test = load_basic_motions(split=\"test\", return_X_y=True)\n",
    "X_test_transform = minirocket_multi.transform(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.6 Classify the Test Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:12.494679Z",
     "iopub.status.busy": "2020-10-12T17:43:12.453795Z",
     "iopub.status.idle": "2020-10-12T17:43:12.548017Z",
     "shell.execute_reply": "2020-10-12T17:43:12.548575Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "classifier.score(X_test_transform, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## 3 Pipeline Example\n",
    "\n",
    "MiniRocket can be combined with `RidgeClassifierCV` (or another classifier) in a pipeline. The pipeline then behaves like a self-contained classifier: a single call to `fit` both fits the transform and trains the classifier, with no need to transform the data separately.\n",
    "\n",
    "### 3.1 Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# everything needed here was already imported in section 1.1 above"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Initialise the Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:12.552186Z",
     "iopub.status.busy": "2020-10-12T17:43:12.551660Z",
     "iopub.status.idle": "2020-10-12T17:43:12.553415Z",
     "shell.execute_reply": "2020-10-12T17:43:12.553966Z"
    }
   },
   "outputs": [],
   "source": [
    "minirocket_pipeline = make_pipeline(\n",
    "    MiniRocket(), RidgeClassifierCV(alphas=np.logspace(-3, 3, 10), normalize=True)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 Load the Training Data and Fit the Pipeline\n",
    "\n",
    "**Note**: Input time series must be of length *at least* 9. Shorter time series can be padded using, e.g., `PaddingTransformer` (`sktime.transformations.panel.padder`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:12.557100Z",
     "iopub.status.busy": "2020-10-12T17:43:12.556478Z",
     "iopub.status.idle": "2020-10-12T17:43:12.885951Z",
     "shell.execute_reply": "2020-10-12T17:43:12.886625Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Pipeline(steps=[('minirocket', MiniRocket()),\n",
       "                ('ridgeclassifiercv',\n",
       "                 RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n",
       "       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n",
       "       2.15443469e+02, 1.00000000e+03]),\n",
       "                                   normalize=True))])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_train, y_train = load_arrow_head(split=\"train\", return_X_y=True)\n",
    "\n",
    "# it is necessary to pass y_train to the pipeline\n",
    "# y_train is not used for the transform, but it is used by the classifier\n",
    "minirocket_pipeline.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4 Load and Classify the Test Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-12T17:43:12.890535Z",
     "iopub.status.busy": "2020-10-12T17:43:12.889866Z",
     "iopub.status.idle": "2020-10-12T17:43:13.897048Z",
     "shell.execute_reply": "2020-10-12T17:43:13.897624Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8685714285714285"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_test, y_test = load_arrow_head(split=\"test\", return_X_y=True)\n",
    "\n",
    "minirocket_pipeline.score(X_test, y_test)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
